{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stock Data\n",
    "Now that you've had exposure to time series data, let's look at bringing stock prices into Pandas.\n",
    "## Reading in Data\n",
    "Your dataset can come in a variety of different formats. The most common format is the [CSV](https://en.wikipedia.org/wiki/Comma-separated_values). We'll use the \"prices.csv\" file as an example csv file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('prices.csv', 'r') as file:\n",
    "    prices = file.read()\n",
    "    \n",
    "print(prices)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data provider will provide you with information for each field in the CSV. This csv has the fields ticker, date, open, high, low, close, volume, adj_close, adj_volume in that order. That means, the first line in the CSV has the following data:\n",
    "\n",
    "- ticker: ABC\n",
    "- date: 2017-09-05\n",
    "- open: 163.09\n",
    "- high: 164.24\n",
    "- low: 160.21\n",
    "- close: 162.63\n",
    "- volume: 29417590.0\n",
    "- adj_close: 162.49\n",
    "- adj_volume: 29414672.0\n",
    "\n",
    "Let's move this data into a DataFrame. For this, we'll need to use the [`pd.read_csv`](https://pandas.pydata.org/pandas-docs/version/0.21.1/generated/pandas.read_csv.html) function. This allows you generate a DataFrame from CSV data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "price_df = pd.read_csv('prices.csv')\n",
    "\n",
    "price_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That generated a DataFrame using the CSV, but assumed the first row contains the field names. We'll have to supply the function's parameter `names` with a list of fiels names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df = pd.read_csv('prices.csv', names=['ticker', 'date', 'open', 'high', 'low',\n",
    "                                             'close', 'volume', 'adj_close', 'adj_volume'])\n",
    "\n",
    "price_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame Calculations\n",
    "Now that we have the data in a DataFrame, we can start to do calculations on it. Let's find out the median value for each stock using the [`DataFrame.median`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.median.html) function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.median()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's not right. Those are the median values for the whole stock universe. We'll use the [`DataFrame.groupby`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.groupby.html) function to get mean for each stock."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.groupby('ticker').median()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's what we're looking for! However, we don't want to run the `groupby` function each time we make an operation. We could save the GroupBy object by doing `price_df_ticker_groups = price_df.groupby('ticker')`. This limits us to the operations of GroupBy objects. There's the [`GroupBy.apply`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.core.groupby.GroupBy.apply.html), but then we lose out on performance. The true problem is the way the data is represented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.iloc[:16]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can you spot our problem? Take a moment to see if you can find it.\n",
    "\n",
    "The problem is between lines [6, 7] and  [13, 14]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.iloc[[6, 7, 13, 14]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data for all the tickers are stacked. We're representing 3 dimensional data in 2 dimensions. This was solved using Panda's Panels, which is [deprecated](https://pandas.pydata.org/pandas-docs/version/0.21.0/dsintro.html#deprecate-panel). The Pandas documentation recommends we use either [MultiIndex](https://pandas.pydata.org/pandas-docs/version/0.21/advanced.html) or [xarray](http://xarray.pydata.org/en/stable/). [MultiIndex](https://pandas.pydata.org/pandas-docs/version/0.21/advanced.html) still doesn't solve our problem, since the data is still represented in 2 dimensions. [xarray](http://xarray.pydata.org/en/stable/) is able to store 3 dimensional data, but Finance uses Pandas, so we'll stick with this library. After you finish this program, I recommend you check out [xarray](http://xarray.pydata.org/en/stable/).\n",
    "\n",
    "So, how do we use our 3-dimensional data with Pandas? We can split each 3rd dimension into it's own 2 dimension DataFrame. Let's take this array as an example:\n",
    "```\n",
    "[\n",
    "    [\n",
    "        [ 0,  1],\n",
    "        [ 2,  3],\n",
    "        [ 4,  5]\n",
    "    ],[\n",
    "        [ 6,  7],\n",
    "        [ 8,  9],\n",
    "        [10, 11]\n",
    "    ],[\n",
    "        [12, 13],\n",
    "        [14, 15],\n",
    "        [16, 17]\n",
    "    ],[\n",
    "        [18, 19],\n",
    "        [20, 21],\n",
    "        [22, 23]\n",
    "    ]\n",
    "]    \n",
    "```\n",
    "We want to split it into these two 2d arrays:\n",
    "```\n",
    "[\n",
    "    [0, 2, 4],\n",
    "    [6, 8, 10],\n",
    "    [12, 14, 16],\n",
    "    [18, 20, 22]\n",
    "] \n",
    "[\n",
    "    [1, 3, 5],\n",
    "    [7, 8, 11],\n",
    "    [13, 15, 17],\n",
    "    [19, 21, 23]\n",
    "] \n",
    "```\n",
    "In our case, our third dimensions are \"open\", \"high\", \"low\", \"close\", \"volume\", \"adj_close\", and  \"adj_volume\". We'll use the [`DataFrame.pivot`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.pivot.html) function to generate these DataFrames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "open_prices = price_df.pivot(index='date', columns='ticker', values='open')\n",
    "high_prices = price_df.pivot(index='date', columns='ticker', values='high')\n",
    "low_prices = price_df.pivot(index='date', columns='ticker', values='low')\n",
    "close_prices = price_df.pivot(index='date', columns='ticker', values='close')\n",
    "volume = price_df.pivot(index='date', columns='ticker', values='volume')\n",
    "adj_close_prices = price_df.pivot(index='date', columns='ticker', values='adj_close')\n",
    "adj_volume = price_df.pivot(index='date', columns='ticker', values='adj_volume')\n",
    "\n",
    "open_prices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That gives you DataFrames for all the open, high low, etc.. Now, what we have been waiting for.. The mean for each ticker."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "open_prices.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also get the mean for each date by doing a transpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "open_prices.T.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It doesn't matter whether date is the index and tickers are the colums or the other way around. It's always a transpose away. Since we're going to do a lot of operations across dates, we will stick with date as the index and tickers as the colums throughtout this program.\n",
    "## Quiz\n",
    "Let's see if you can apply what you learned. Implment the `csv_to_close` function to take in a filepath, `csv_filename`, and output the close 2d array. You can assume the CSV file used by `csv_to_close` has the same field names as \"prices.csv\" and in the same order.\n",
    "\n",
    "To help with your implemention of quizzes, we provide you with unit tests to test your function implemention. For this quiz, we'll be using the function `test_csv_to_close` in the `quiz_tests` module to test `csv_to_close`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import quiz_tests\n",
    "\n",
    "\n",
    "def csv_to_close(csv_filepath, field_names):\n",
    "    \"\"\"Reads in data from a csv file and produces a DataFrame with close data.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    csv_filepath : str\n",
    "        The name of the csv file to read\n",
    "    field_names : list of str\n",
    "        The field names of the field in the csv file\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    close : DataFrame\n",
    "        Close prices for each ticker and date\n",
    "    \"\"\"\n",
    "    \n",
    "    # TODO: Implement Function\n",
    "    \n",
    "    return None\n",
    "\n",
    "\n",
    "quiz_tests.test_csv_to_close(csv_to_close)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz Solution\n",
    "If you're having trouble, you can check out the quiz solution [here](stock_data_solution.ipynb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
